fix(backups): reject standby backup actions and harden async replication#1587

Draft
marceloneppel wants to merge 4 commits into main from fix/1329-standby-backup-guard-async-repl

Conversation

marceloneppel (Member) commented Apr 6, 2026

Issue

In async replication setups, backup actions on standby clusters should fail with clear guidance to run them on the primary cluster. Today those actions can still start on a standby and fail later with less actionable errors.

Also, create-replication could fail with an uncaught StopIteration when the relation exists but remote units have not published their addresses yet, and relation-changed re-emission could run before any related unit exists.

Solution

  • Add standby-cluster detection in PostgreSQLBackups and fail early for create-backup, list-backups, and restore with explicit action-specific messages.
  • Harden async replication relation handling by failing early when the relation has no remote units yet, and by guarding relation-changed re-emission when there are no related units.
  • Add a Juju3/Ceph-backed integration test that verifies backup works on the primary cluster and is rejected on the standby cluster with the expected message.
  • Update the microceph fixture to use a non-loopback host IP for TLS SAN/certificate usage (this allows local testing in more environments).
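
The early rejection described in the first bullet can be sketched in plain Python, with no charm dependencies. The function name `reject_if_standby`, the `is_standby` flag, and the message wording are hypothetical and are not taken from this PR's diff. Only the three action names come from the description above.

```python
from typing import Optional

# Action names come from the PR description; everything else is illustrative.
GUARDED_ACTIONS = ("create-backup", "list-backups", "restore")


def reject_if_standby(action: str, is_standby: bool) -> Optional[str]:
    """Return a failure message if the action must not run here, else None."""
    if action not in GUARDED_ACTIONS:
        return None
    if not is_standby:
        return None
    # Hypothetical wording: the real messages are defined in the charm.
    return (
        f"Cannot run {action} on a standby cluster. "
        "Run this action on the primary cluster instead."
    )
```

In a charm, an action handler would presumably call a check like this first and fail the action with the returned message, so the operator sees the guidance before any backup tooling is invoked.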

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

Port of #1602.

Fail create-backup, list-backups, and restore on standby clusters with
explicit guidance to run those actions on the primary cluster.

Also guard async replication relation handling when no remote units are
present yet, preventing uncaught StopIteration during create-replication.
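
To illustrate the failure mode, here is a sketch that assumes the handler picked a remote unit with a bare `next()`, which raises StopIteration on an empty collection. The helpers `first_remote_unit` and `create_replication`, the unit names, and the failure message are hypothetical. The PR's actual code is not reproduced here.

```python
from typing import Iterable, Optional


def first_remote_unit(units: Iterable[str]) -> Optional[str]:
    """Return one remote unit name, or None when no unit has joined yet."""
    # next(iter(units)) without a default raises StopIteration on an empty
    # collection; passing a default lets the caller fail with a clear message.
    return next(iter(sorted(units)), None)


def create_replication(units: Iterable[str]) -> str:
    """Fail early when the relation has no remote units (illustrative only)."""
    unit = first_remote_unit(units)
    if unit is None:
        return "failed: relation has no remote units yet"
    return f"ok: using {unit}"
```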

Add unit coverage for both fixes, add a Juju3 Ceph-backed integration test
for async replication backup behavior, and switch microceph TLS setup to a
non-loopback host IP.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
github-actions bot added the "Libraries: OK" label (The charm libs used are OK and in-sync) on Apr 6, 2026
codecov bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 86.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.97%. Comparing base (c7a4832) to head (15c1b15).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/relations/async_replication.py | 69.23% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1587      +/-   ##
==========================================
+ Coverage   75.85%   75.97%   +0.12%     
==========================================
  Files          17       17              
  Lines        4489     4516      +27     
  Branches      687      694       +7     
==========================================
+ Hits         3405     3431      +26     
+ Misses        850      849       -1     
- Partials      234      236       +2     

marceloneppel added the "bug" label (Something isn't working as expected) on Apr 7, 2026